Overview

Dataset statistics

Number of variables: 11
Number of observations: 886
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 58.1 KiB
Average record size in memory: 67.1 B

Variable types

Numeric: 6
Categorical: 5

Alerts

df_index is highly correlated with PassengerId High correlation
PassengerId is highly correlated with df_index High correlation
Survived is highly correlated with Sex_male High correlation
Pclass is highly correlated with Fare High correlation
Fare is highly correlated with Pclass High correlation
Embarked_Q is highly correlated with Embarked_S High correlation
Embarked_S is highly correlated with Embarked_Q High correlation
Sex_male is highly correlated with Survived High correlation
SibSp is highly correlated with Parch High correlation
Parch is highly correlated with SibSp High correlation
df_index is uniformly distributed Uniform
PassengerId is uniformly distributed Uniform
df_index has unique values Unique
PassengerId has unique values Unique
SibSp has 603 (68.1%) zeros Zeros
Parch has 674 (76.1%) zeros Zeros

Reproduction

Analysis started: 2022-06-19 05:57:11.989948
Analysis finished: 2022-06-19 05:57:28.828808
Duration: 16.84 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 886
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 444.6173815
Minimum: 0
Maximum: 890
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 7.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 44.25
Q1: 222.25
median: 444.5
Q3: 665.75
95-th percentile: 845.75
Maximum: 890
Range: 890
Interquartile range (IQR): 443.5

Descriptive statistics

Standard deviation: 257.0487858
Coefficient of variation (CV): 0.578134811
Kurtosis: -1.195429246
Mean: 444.6173815
Median Absolute Deviation (MAD): 222
Skewness: 0.002397173832
Sum: 393931
Variance: 66074.0783
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Most common values

Value                 Count    Frequency (%)
0                     1        0.1%
597                   1        0.1%
586                   1        0.1%
587                   1        0.1%
588                   1        0.1%
589                   1        0.1%
590                   1        0.1%
591                   1        0.1%
592                   1        0.1%
593                   1        0.1%
Other values (876)    876      98.9%

Smallest values

Value    Count    Frequency (%)
0        1        0.1%
1        1        0.1%
2        1        0.1%
3        1        0.1%
4        1        0.1%
5        1        0.1%
6        1        0.1%
7        1        0.1%
8        1        0.1%
9        1        0.1%

Largest values

Value    Count    Frequency (%)
890      1        0.1%
889      1        0.1%
888      1        0.1%
887      1        0.1%
886      1        0.1%
885      1        0.1%
884      1        0.1%
883      1        0.1%
882      1        0.1%
881      1        0.1%
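
Several of the descriptive statistics reported for df_index are linked by simple identities, which gives a quick sanity check on the numbers. The sketch below recomputes them from the values printed in this section; the variable names are illustrative and not part of the report.

```python
# Values copied from the df_index statistics in this report.
mean = 444.6173815
std = 257.0487858
q1, q3 = 222.25, 665.75
minimum, maximum = 0, 890

cv = std / mean                  # coefficient of variation = std / mean
variance = std ** 2              # variance = squared standard deviation
iqr = q3 - q1                    # interquartile range = Q3 - Q1
value_range = maximum - minimum  # range = max - min

print(f"CV={cv:.9f}  variance={variance:.4f}  IQR={iqr}  range={value_range}")
```

The recomputed values agree with the reported CV, variance, IQR and range up to rounding of the printed digits.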

PassengerId
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 886
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 445.6173815
Minimum: 1
Maximum: 891
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 45.25
Q1: 223.25
median: 445.5
Q3: 666.75
95-th percentile: 846.75
Maximum: 891
Range: 890
Interquartile range (IQR): 443.5

Descriptive statistics

Standard deviation: 257.0487858
Coefficient of variation (CV): 0.5768374316
Kurtosis: -1.195429246
Mean: 445.6173815
Median Absolute Deviation (MAD): 222
Skewness: 0.002397173832
Sum: 394817
Variance: 66074.0783
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Most common values

Value                 Count    Frequency (%)
1                     1        0.1%
598                   1        0.1%
587                   1        0.1%
588                   1        0.1%
589                   1        0.1%
590                   1        0.1%
591                   1        0.1%
592                   1        0.1%
593                   1        0.1%
594                   1        0.1%
Other values (876)    876      98.9%

Smallest values

Value    Count    Frequency (%)
1        1        0.1%
2        1        0.1%
3        1        0.1%
4        1        0.1%
5        1        0.1%
6        1        0.1%
7        1        0.1%
8        1        0.1%
9        1        0.1%
10       1        0.1%

Largest values

Value    Count    Frequency (%)
891      1        0.1%
890      1        0.1%
889      1        0.1%
888      1        0.1%
887      1        0.1%
886      1        0.1%
885      1        0.1%
884      1        0.1%
883      1        0.1%
882      1        0.1%

Survived
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

Value    Count
0        549
1        337

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 886
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value    Count    Frequency (%)
0        549      62.0%
1        337      38.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count    Frequency (%)
0        549      62.0%
1        337      38.0%

Most occurring characters

Value    Count    Frequency (%)
0        549      62.0%
1        337      38.0%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    886      100.0%

Most frequent character per category

Decimal Number
Value    Count    Frequency (%)
0        549      62.0%
1        337      38.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    886      100.0%

Most frequent character per script

Common
Value    Count    Frequency (%)
0        549      62.0%
1        337      38.0%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    886      100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
0        549      62.0%
1        337      38.0%

Pclass
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

Value    Count
3        491
1        211
2        184

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 886
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 3
4th row: 1
5th row: 3

Common Values

Value    Count    Frequency (%)
3        491      55.4%
1        211      23.8%
2        184      20.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count    Frequency (%)
3        491      55.4%
1        211      23.8%
2        184      20.8%

Most occurring characters

Value    Count    Frequency (%)
3        491      55.4%
1        211      23.8%
2        184      20.8%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    886      100.0%

Most frequent character per category

Decimal Number
Value    Count    Frequency (%)
3        491      55.4%
1        211      23.8%
2        184      20.8%

Most occurring scripts

Value     Count    Frequency (%)
Common    886      100.0%

Most frequent character per script

Common
Value    Count    Frequency (%)
3        491      55.4%
1        211      23.8%
2        184      20.8%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    886      100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
3        491      55.4%
1        211      23.8%
2        184      20.8%

Age
Real number (ℝ)

Distinct: 88
Distinct (%): 9.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.16531082 × 10^-16
Minimum: -2.222022385
Maximum: 3.901957659
Zeros: 0
Zeros (%): 0.0%
Negative: 561
Negative (%): 63.3%
Memory size: 7.0 KiB

Quantile statistics

Minimum: -2.222022385
5-th percentile: -1.792620416
Q1: -0.5613602919
median: -0.09963774526
Q3: 0.4390385591
95-th percentile: 1.901159957
Maximum: 3.901957659
Range: 6.123980043
Interquartile range (IQR): 1.000398851

Descriptive statistics

Standard deviation: 1.000564812
Coefficient of variation (CV): 4.620883076 × 10^15
Kurtosis: 1.003223995
Mean: 2.16531082 × 10^-16
Median Absolute Deviation (MAD): 0.4617225466
Skewness: 0.5120820931
Sum: 2.011724121 × 10^-13
Variance: 1.001129944
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most common values

Value                Count    Frequency (%)
-0.09963774526       202      22.8%
-0.4074527763        30       3.4%
-0.5613602919        27       3.0%
-0.869175323         26       2.9%
-0.7922215652        25       2.8%
0.05426977028        25       2.8%
-0.6383140497        24       2.7%
-0.3304990186        23       2.6%
0.5159923169         21       2.4%
-0.02268398749       20       2.3%
Other values (78)    463      52.3%

Smallest values

Value           Count    Frequency (%)
-2.222022385    1        0.1%
-2.202783945    1        0.1%
-2.196627645    2        0.2%
-2.190471344    2        0.2%
-2.183545506    1        0.1%
-2.177389205    7        0.8%
-2.100435447    10       1.1%
-2.02348169     6        0.7%
-1.946527932    10       1.1%
-1.869574174    4        0.5%

Largest values

Value          Count    Frequency (%)
3.901957659    1        0.1%
3.440235112    1        0.1%
3.209373839    2        0.2%
3.17089696     1        0.1%
3.132420081    2        0.2%
2.82460505     1        0.1%
2.747651292    3        0.3%
2.670697534    2        0.2%
2.593743777    2        0.2%
2.516790019    3        0.3%
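
The Age statistics (mean of about 0, variance 1.001129944) look like the result of z-score standardization, although the report does not say how the column was preprocessed. The following is a minimal sketch under that assumption: if values are scaled with the population standard deviation (ddof=0, as scikit-learn's StandardScaler does), the sample variance computed afterwards equals n/(n-1), which for n = 886 is about 1.00113. The `raw_ages` list and the `standardize` helper are hypothetical and only illustrate the arithmetic.

```python
import math
import statistics


def standardize(values):
    """Z-score the values using the population standard deviation (ddof=0)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical raw ages, not taken from the report.
raw_ages = [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0, 27.0]
z = standardize(raw_ages)
n = len(z)

# The sample variance (ddof=1) of population-standardized data is n / (n - 1).
sample_var = statistics.variance(z)
print(sample_var, n / (n - 1))

# With the 886 observations of this report:
expected_var = 886 / 885
expected_std = math.sqrt(expected_var)
print(expected_var, expected_std)
```

The last two values match the variance and standard deviation reported for Age (and for Fare) to the printed precision, which is consistent with the assumption but does not prove it.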

SibSp
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5259593679
Minimum: 0
Maximum: 8
Zeros: 603
Zeros (%): 68.1%
Negative: 0
Negative (%): 0.0%
Memory size: 7.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.105151233
Coefficient of variation (CV): 2.101210284
Kurtosis: 17.77682411
Mean: 0.5259593679
Median Absolute Deviation (MAD): 0
Skewness: 3.684609408
Sum: 466
Variance: 1.221359248
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Most common values

Value    Count    Frequency (%)
0        603      68.1%
1        209      23.6%
2        28       3.2%
4        18       2.0%
3        16       1.8%
8        7        0.8%
5        5        0.6%

Smallest values

Value    Count    Frequency (%)
0        603      68.1%
1        209      23.6%
2        28       3.2%
3        16       1.8%
4        18       2.0%
5        5        0.6%
8        7        0.8%

Largest values

Value    Count    Frequency (%)
8        7        0.8%
5        5        0.6%
4        18       2.0%
3        16       1.8%
2        28       3.2%
1        209      23.6%
0        603      68.1%

Parch
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3826185102
Minimum: 0
Maximum: 6
Zeros: 674
Zeros (%): 76.1%
Negative: 0
Negative (%): 0.0%
Memory size: 7.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.807655689
Coefficient of variation (CV): 2.110864131
Kurtosis: 9.734674532
Mean: 0.3826185102
Median Absolute Deviation (MAD): 0
Skewness: 2.744455451
Sum: 339
Variance: 0.6523077119
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Most common values

Value    Count    Frequency (%)
0        674      76.1%
1        117      13.2%
2        80       9.0%
5        5        0.6%
3        5        0.6%
4        4        0.5%
6        1        0.1%

Smallest values

Value    Count    Frequency (%)
0        674      76.1%
1        117      13.2%
2        80       9.0%
3        5        0.6%
4        4        0.5%
5        5        0.6%
6        1        0.1%

Largest values

Value    Count    Frequency (%)
6        1        0.1%
5        5        0.6%
4        4        0.5%
3        5        0.6%
2        80       9.0%
1        117      13.2%
0        674      76.1%

Fare
Real number (ℝ)

HIGH CORRELATION

Distinct: 246
Distinct (%): 27.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.383393024 × 10^-16
Minimum: -0.7407918193
Maximum: 5.653180587
Zeros: 0
Zeros (%): 0.0%
Negative: 659
Negative (%): 74.4%
Memory size: 7.0 KiB

Quantile statistics

Minimum: -0.7407918193
5-th percentile: -0.5651399158
Q1: -0.5488316394
median: -0.3893859032
Q3: 0.004284656837
95-th percentile: 1.954967726
Maximum: 5.653180587
Range: 6.393972406
Interquartile range (IQR): 0.5531162962

Descriptive statistics

Standard deviation: 1.000564812
Coefficient of variation (CV): 7.232686554 × 10^15
Kurtosis: 12.13091898
Mean: 1.383393024 × 10^-16
Median Absolute Deviation (MAD): 0.1678527159
Skewness: 3.204136704
Sum: 1.151301277 × 10^-13
Variance: 1.001129944
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most common values

Value                 Count    Frequency (%)
-0.545082778          43       4.9%
-0.4247399514         42       4.7%
-0.5488316394         38       4.3%
-0.5523762827         34       3.8%
-0.1086880834         31       3.5%
-0.4855191567         24       2.7%
-0.5481217383         18       2.0%
-0.5517684906         16       1.8%
-0.5650378067         15       1.7%
-0.7407918193         15       1.7%
Other values (236)    610      68.8%

Smallest values

Value            Count    Frequency (%)
-0.7407918193    15       1.7%
-0.6432411947    1        0.1%
-0.6192334086    1        0.1%
-0.5891477019    1        0.1%
-0.5842853655    1        0.1%
-0.5839814695    1        0.1%
-0.5828679944    2        0.2%
-0.5766879648    2        0.2%
-0.5740550096    1        0.1%
-0.5718256284    1        0.1%

Largest values

Value          Count    Frequency (%)
5.653180587    4        0.5%
5.637985785    2        0.2%
5.276855196    2        0.2%
4.790723662    4        0.5%
4.651033599    1        0.1%
4.401128956    1        0.1%
4.397178308    3        0.3%
3.267394989    2        0.2%
2.990139703    3        0.3%
2.943643611    4        0.5%

Embarked_Q
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

Value    Count
0        809
1        77

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 886
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value    Count    Frequency (%)
0        809      91.3%
1        77       8.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count    Frequency (%)
0        809      91.3%
1        77       8.7%

Most occurring characters

Value    Count    Frequency (%)
0        809      91.3%
1        77       8.7%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    886      100.0%

Most frequent character per category

Decimal Number
Value    Count    Frequency (%)
0        809      91.3%
1        77       8.7%

Most occurring scripts

Value     Count    Frequency (%)
Common    886      100.0%

Most frequent character per script

Common
Value    Count    Frequency (%)
0        809      91.3%
1        77       8.7%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    886      100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
0        809      91.3%
1        77       8.7%

Embarked_S
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

Value    Count
1        644
0        242

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 886
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value    Count    Frequency (%)
1        644      72.7%
0        242      27.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count    Frequency (%)
1        644      72.7%
0        242      27.3%

Most occurring characters

Value    Count    Frequency (%)
1        644      72.7%
0        242      27.3%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    886      100.0%

Most frequent character per category

Decimal Number
Value    Count    Frequency (%)
1        644      72.7%
0        242      27.3%

Most occurring scripts

Value     Count    Frequency (%)
Common    886      100.0%

Most frequent character per script

Common
Value    Count    Frequency (%)
1        644      72.7%
0        242      27.3%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    886      100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
1        644      72.7%
0        242      27.3%

Sex_male
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB

Value    Count
1        575
0        311

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 886
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 0
4th row: 0
5th row: 1

Common Values

Value    Count    Frequency (%)
1        575      64.9%
0        311      35.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count    Frequency (%)
1        575      64.9%
0        311      35.1%

Most occurring characters

Value    Count    Frequency (%)
1        575      64.9%
0        311      35.1%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    886      100.0%

Most frequent character per category

Decimal Number
Value    Count    Frequency (%)
1        575      64.9%
0        311      35.1%

Most occurring scripts

Value     Count    Frequency (%)
Common    886      100.0%

Most frequent character per script

Common
Value    Count    Frequency (%)
1        575      64.9%
0        311      35.1%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    886      100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
1        575      64.9%
0        311      35.1%

Interactions

[Figures: pairwise interaction plots; images not preserved]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
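
The calculation described above can be sketched in plain Python: rank both variables (ties receive the average of their ranks), then apply the covariance-over-standard-deviations formula to the ranks. This is an illustrative sketch, not the code used to produce this report; the helper names and the example data are assumptions.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y), pearson(x, y))
```

On this example ρ reaches its maximum because the relationship is perfectly monotonic, while Pearson's r stays below 1 because it is not linear.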

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
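
A minimal sketch of this formula in plain Python follows, together with a check of the invariance property for shifts and positive rescalings. The function name and the example data are hypothetical and are not taken from the report.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / (sx * sy)


# Hypothetical example data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([3 * a + 10 for a in x], [0.5 * b - 2 for b in y])
print(r, r_rescaled)
```

Both calls return the same value up to floating-point error. A negative scale factor on one variable would flip the sign of r while leaving its magnitude unchanged.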

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
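
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below in plain Python. This is an illustration only: statistical libraries often compute tau-b instead, which adjusts the denominator for ties, so the report's values need not match this sketch on tied data. The function name and example data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair oppositely
        # sign == 0 means a tie in x or y; it counts toward neither group
    return (concordant - discordant) / len(pairs)


# Hypothetical data: one swapped pair out of six possible pairs.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Here five pairs are concordant and one is discordant, so τ = (5 - 1) / 6.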

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
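
As a rough illustration, the bias correction attributed to Bergsma is commonly written as follows: with φ² = χ²/n for an r × k contingency table, use φ²⁺ = max(0, φ² − (k−1)(r−1)/(n−1)), r̃ = r − (r−1)²/(n−1), k̃ = k − (k−1)²/(n−1), and V = sqrt(φ²⁺ / min(k̃−1, r̃−1)). The sketch below implements that formula in plain Python; it is not taken from the report's software, and the contingency table is hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows.

    Assumes every row and column total is positive.
    """
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical 2x2 table of counts, not taken from this report.
hypothetical_table = [[30, 10], [15, 45]]
v = cramers_v_corrected(hypothetical_table)
print(v)
```

An exactly independent table yields 0 and a perfectly associated one yields 1, which matches the range described above.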

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.

Sample

First rows

row  df_index  PassengerId  Survived  Pclass  Age        SibSp  Parch  Fare       Embarked_Q  Embarked_S  Sex_male
0    0         1            0         3       -0.561360  1      0      -0.564532  0           1           1
1    1         2            1         1       0.669900   1      0      0.992225   0           0           0
2    2         3            1         3       -0.253545  0      0      -0.548122  0           1           0
3    3         4            1         1       0.439039   1      0      0.550159   0           1           0
4    4         5            0         3       0.439039   0      0      -0.545083  0           1           1
5    5         6            0         3       -0.099638  0      0      -0.535156  1           0           1
6    6         7            0         1       1.901160   0      0      0.520073   0           1           1
7    7         8            0         3       -2.100435  3      1      -0.228423  0           1           1
8    8         9            1         3       -0.176592  0      2      -0.470123  0           1           0
9    9         10           1         2       -1.176990  1      0      -0.009720  0           0           0

Last rows

row  df_index  PassengerId  Survived  Pclass  Age        SibSp  Parch  Fare       Embarked_Q  Embarked_S  Sex_male
876  881       882          0         3       0.285131   0      0      -0.548832  0           1           1
877  882       883          0         3       -0.561360  0      0      -0.485113  0           1           0
878  883       884          0         2       -0.099638  0      0      -0.485519  0           1           1
879  884       885          0         3       -0.330499  0      0      -0.569394  0           1           1
880  885       886          0         3       0.746854   0      5      -0.032714  1           0           0
881  886       887          0         2       -0.176592  0      0      -0.424740  0           1           1
882  887       888          1         1       -0.792222  0      0      -0.011441  0           1           0
883  888       889          0         3       -0.099638  1      2      -0.170683  0           1           0
884  889       890          1         1       -0.253545  0      0      -0.011441  0           0           1
885  890       891          0         3       0.208177   0      0      -0.552376  1           0           1